
VoiceBand

What is it?
[Screenshot of VoiceBand]
From top to bottom:
  • The time-domain data (oscilloscope view) of the incoming sound
  • A short-time frequency analysis of the incoming sound
  • A piano keyboard, with the red line marking the current fundamental frequency of the incoming sound. The exact frequency is printed just below the keyboard
  • A visual profiling tool for the multithreaded job system. The green block shows how much time is spent detecting the pitch, the red blocks are the pitch-shifting jobs, and the white blocks are the remaining jobs (e.g. the last white block on the first row is the output job). There is one row per available core; the screenshot was taken on a dual-core machine, so there are two rows in total

VoiceBand is my general-purpose audio processing research application, and it is still very much under construction.

Currently, it can do the following things in real time, with low latency:
  • Record audio from a microphone
  • Playback wave files
  • Add echo (a minimal sketch of such a node follows this list)
  • Apply a low-pass filter (and, by extension, high-pass and band-pass filters)
  • Detect the pitch of a single voice (singing or monophonic instruments), using this method
  • Change pitch without changing duration (using this free phase vocoder; improvements are planned)
  • Take input from a MIDI keyboard
  • Mix multiple audio blocks together
  • Compress clipping audio
  • Playback processed audio
  • Save processed audio to a wave file
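
As an illustration of how small one of these building blocks can be, here is a minimal sketch of an echo node: a feedback delay line, where out[n] = in[n] + feedback * out[n - delay]. The class name and interface here are hypothetical, not VoiceBand's actual API.

    #include <cstddef>
    #include <vector>

    // A feedback delay line: each output sample is written back into the
    // buffer, so the echo repeats and decays. delaySamples must be > 0.
    class EchoNode {
    public:
        EchoNode(std::size_t delaySamples, float feedback)
            : buffer_(delaySamples, 0.0f), feedback_(feedback) {}

        // Process one block of audio in place.
        void process(float* samples, std::size_t count) {
            for (std::size_t i = 0; i < count; ++i) {
                samples[i] += feedback_ * buffer_[pos_]; // mix in the delayed signal
                buffer_[pos_] = samples[i];              // store output for later echoes
                pos_ = (pos_ + 1) % buffer_.size();
            }
        }

    private:
        std::vector<float> buffer_; // holds the last delaySamples output samples
        float feedback_;            // echo level, typically between 0 and 1
        std::size_t pos_ = 0;       // circular write/read position
    };
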
All of these operations are just building blocks (nodes) in the software architecture, so they can be combined into a processing graph to create interesting programs. Because I use a graph system in which each node depends on its predecessors, nodes that can execute at the same time are automatically scheduled across multiple processors where available, making more efficient use of multicore machines.
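
To make the scheduling idea concrete, here is a rough sketch of how dependency-driven scheduling can work: each node counts its unfinished producers, and a pool of worker threads (ideally one per core) pulls nodes whose count has dropped to zero. The Node and Scheduler types below are simplified illustrations, not the actual job system.

    #include <atomic>
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    struct Node {
        std::function<void()> process;     // the audio work for one block
        std::vector<Node*> dependents;     // nodes that consume this node's output
        std::atomic<int> pendingInputs{0}; // producers not yet finished this block
        int inputCount = 0;                // total producers, used to reset the counter
    };

    class Scheduler {
    public:
        explicit Scheduler(unsigned threads = std::thread::hardware_concurrency()) {
            for (unsigned i = 0; i < threads; ++i)
                workers_.emplace_back([this] { workerLoop(); });
        }
        ~Scheduler() {
            { std::lock_guard<std::mutex> lock(m_); done_ = true; }
            cv_.notify_all();
            for (auto& t : workers_) t.join();
        }

        // Push one audio block through the graph; sources have no inputs.
        void runBlock(std::vector<Node*>& all, std::vector<Node*>& sources) {
            remaining_ = static_cast<int>(all.size());
            for (Node* n : all) n->pendingInputs = n->inputCount;
            for (Node* n : sources) enqueue(n);
            std::unique_lock<std::mutex> lock(m_);
            blockDone_.wait(lock, [this] { return remaining_ == 0; });
        }

    private:
        void enqueue(Node* n) {
            { std::lock_guard<std::mutex> lock(m_); ready_.push(n); }
            cv_.notify_one();
        }
        void workerLoop() {
            for (;;) {
                Node* n = nullptr;
                {
                    std::unique_lock<std::mutex> lock(m_);
                    cv_.wait(lock, [this] { return done_ || !ready_.empty(); });
                    if (done_) return;
                    n = ready_.front();
                    ready_.pop();
                }
                n->process();
                // A dependent whose last input just finished becomes ready to run.
                for (Node* d : n->dependents)
                    if (--d->pendingInputs == 0) enqueue(d);
                std::lock_guard<std::mutex> lock(m_);
                if (--remaining_ == 0) blockDone_.notify_all();
            }
        }

        std::vector<std::thread> workers_;
        std::queue<Node*> ready_;       // nodes whose inputs are all complete
        std::mutex m_;
        std::condition_variable cv_, blockDone_;
        std::atomic<int> remaining_{0}; // nodes left to process this block
        bool done_ = false;
    };

In a scheme like this, each worker thread corresponds to one row in the profiler shown at the top of the page.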

Currently, the graphs need to be created programmatically, but in the future, I intend to create a UI that lets you drag and connect nodes to quickly set up and modify various processing graphs.
Example (pitch-corrected singing):

[Processing graph for example 1]

With this graph, the input will be pitch-corrected to the nearest semitone on the equal-tempered scale. Since each node depends on the completion of one or more previous nodes, no parallelisation takes place.
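
The correction step itself is just a little math: measure how far the detected frequency is from A4 (440 Hz) in semitones, round to the nearest whole semitone, and derive the ratio to hand to the pitch shifter. A minimal sketch (the function name is illustrative, not VoiceBand's actual code):

    #include <cmath>

    // Snap a detected fundamental frequency (Hz) to the nearest semitone on
    // the equal-tempered scale (A4 = 440 Hz), and return the frequency ratio
    // that the pitch-shift node should apply.
    double correctionRatio(double detectedHz) {
        double semitonesFromA4 = 12.0 * std::log2(detectedHz / 440.0);
        double targetHz = 440.0 * std::pow(2.0, std::round(semitonesFromA4) / 12.0);
        return targetHz / detectedHz; // 1.0 means the note was already in tune
    }
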

This is what the current version sounds like (singing recorded on non-professional equipment, so excuse the noise):
Original: Hallelujah.mp3
With added background vocals: output1.mp3

It's not perfect, but as a proof of concept, it will do for now.
Example (MIDI-keyboard-controlled background voices):

[Processing graph for example 2]

With this graph, voices will be added to your own singing, based on which keys are pressed on a MIDI keyboard.
Because of the dependencies and structure, the three pitch-shifting nodes (easily the most CPU-intensive part of this graph) will be executed simultaneously on three cores, if available.
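
Using the hypothetical Node type from the scheduler sketch above, the wiring of this graph could look roughly like this; note that the three shift nodes depend on the pitch detector and the MIDI node, but not on each other:

    // Wiring example 2's graph, reusing the hypothetical Node type from the
    // scheduler sketch above (the process callbacks are omitted for brevity;
    // in real code the nodes would live in the graph object, not a function).
    void buildExample2Graph() {
        static Node mic, midi, pitchDetect, shift1, shift2, shift3, mixer, output;

        auto connect = [](Node& from, Node& to) {
            from.dependents.push_back(&to);
            ++to.inputCount; // one more producer for 'to' to wait for
        };

        connect(mic, pitchDetect);    // detect the pitch of the live voice
        connect(pitchDetect, shift1); // each shifter needs the detected pitch...
        connect(pitchDetect, shift2);
        connect(pitchDetect, shift3);
        connect(midi, shift1);        // ...and a target note from the keyboard
        connect(midi, shift2);
        connect(midi, shift3);
        connect(mic, mixer);          // the dry voice...
        connect(shift1, mixer);       // ...plus three pitch-shifted copies
        connect(shift2, mixer);
        connect(shift3, mixer);
        connect(mixer, output);

        // shift1..shift3 have no edges between them, so once pitchDetect and
        // midi finish, all three become ready at once and can run on up to
        // three cores simultaneously.
    }
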

This is what the current version sounds like (singing recorded on non-professional equipment, so excuse the noise):
Original: Hallelujah.mp3
With added background vocals: output_Prototype2.mp3

Again, it's not perfect, but it will do as a starting point for further research.

Future plans
Things I'd like to do in the future with this framework:
  • Use a time-domain pitch-shifting algorithm (PSOLA or similar) instead of the phase vocoder I use now
  • Do proper formant correction/restoration when changing pitch
  • Work on making the backing vocals sound more human (introduce slight pitch wavering, volume changes, and timing offsets; a rough sketch of the idea follows this list)
  • Work on making the backing vocals sound different (modify the formants and other characteristics so they sound like different voices)
  • Add a node to add automatic vibrato to the input
  • Add a UI so I can drag and visually connect nodes more easily
  • Introduce parallelism within CPU-heavy nodes to make more efficient use of multicore machines
  • ...
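
For the humanisation item above, the core idea is simple: give each backing voice a slowly drifting pitch ratio and gain instead of machine-perfect constants. The sketch below illustrates one way to do that; the struct, parameter names, and ranges are guesses of mine, not a spec.

    #include <cmath>
    #include <random>

    // Wobble a backing voice's pitch ratio and gain slightly over time, so it
    // stops sounding machine-perfect: a slow sine waver plus a little random
    // jitter per block.
    struct Humanizer {
        double lfoPhase = 0.0;
        std::mt19937 rng{std::random_device{}()};
        std::normal_distribution<double> jitter{0.0, 0.002};

        // Called once per audio block; fills in multipliers for this block.
        void nextBlock(double blockSeconds, double& pitchMul, double& gainMul) {
            lfoPhase += 2.0 * 3.14159265358979 * 5.0 * blockSeconds; // ~5 Hz waver
            pitchMul = 1.0 + 0.003 * std::sin(lfoPhase) + jitter(rng);
            gainMul  = 1.0 + 0.05 * std::sin(lfoPhase * 0.37);       // slower volume drift
        }
    };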